Policy-Gradient for Robust Planning
Authors
Abstract
Real-world Decision-Theoretic Planning (DTP) is a very challenging research field. A common approach is to model such problems as Markov Decision Problems (MDPs) and to apply dynamic programming techniques. Yet, two major difficulties arise: (1) dynamic programming does not scale with the number of tasks, and (2) the probabilistic model may be uncertain, leading to the choice of unsafe policies. We build on Policy Gradient algorithms to address the first difficulty, and on robust decision-making to address the second, through algorithms that train competing learning agents. The first agent learns the plan while the second learns the model most likely to upset that plan. It is known from gradient-based game theory that at least one player may fail to converge, so we focus on convergence of the robust plan only, using non-symmetric algorithms.
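The competing-agents idea can be illustrated with a small sketch. Everything below is our own toy construction, not the paper's algorithm: a 2×2 reward matrix stands in for an uncertain MDP, the "planner" chooses P(action 0) = p by projected gradient ascent, and the adversarial "model" agent chooses P(outcome 0) = q by projected gradient descent. The updates are non-symmetric, as in the abstract: the adversary takes several inner steps per planner step, and we track only the (averaged) plan, since the adversary's iterates need not settle.

```python
import numpy as np

# Toy reward matrix (hypothetical): reward[action, outcome].
R = np.array([[1.0, -1.0],
              [-0.5, 0.5]])

def value(p, q):
    """Expected reward of plan p against model q (bilinear in p, q)."""
    return np.array([p, 1 - p]) @ R @ np.array([q, 1 - q])

p, q = 0.5, 0.5                 # planner and adversary parameters
lr_plan, lr_model = 0.02, 0.5   # asymmetric step sizes (our choice)
p_avg, n = 0.0, 0

for step in range(500):
    # Inner loop: the adversarial model agent (near-)best-responds.
    for _ in range(5):
        dV_dq = value(p, 1.0) - value(p, 0.0)   # exact, since V is linear in q
        q = np.clip(q - lr_model * dV_dq, 0.0, 1.0)
    # Outer step: the planner ascends against the current worst model.
    dV_dp = value(1.0, q) - value(0.0, q)       # exact, since V is linear in p
    p = np.clip(p + lr_plan * dV_dp, 0.0, 1.0)
    p_avg += p
    n += 1

p_avg /= n
# Robustness of the averaged plan: its value under the worst model.
worst = min(value(p_avg, 0.0), value(p_avg, 1.0))
print(round(p_avg, 2), round(worst, 2))
```

In this game the naive plan p = 0.5 can be driven to a worst-case value of -0.25, while the averaged plan produced by the asymmetric scheme ends up near the saddle point and is nearly immune to the adversary; averaging the planner's iterates is a standard remedy when raw gradient play cycles instead of converging.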
Similar resources
Stochastic Motion Planning for Hopping Rovers on Small Solar System Bodies
Hopping rovers have emerged as a promising platform for the future surface exploration of small Solar System bodies, such as asteroids and comets. However, hopping dynamics are governed by nonlinear gravity fields and stochastic bouncing on highly irregular surfaces, which pose several challenges for traditional motion planning methods. This paper presents the first ever discussion of motion pl...
Full text

A Two-Teams Approach for Robust Probabilistic Temporal Planning
Large real-world Probabilistic Temporal Planning (PTP) is a very challenging research field. A common approach is to model such problems as Markov Decision Problems (MDPs) and use dynamic programming techniques. Yet, two major difficulties arise: (1) dynamic programming does not scale with the number of tasks, and (2) the probabilistic model may be uncertain, leading to the choice of unsafe policies. ...
Full text

Equilibrium Policy Gradients for Spatiotemporal Planning
In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doin...
Full text

Gradient-based Reinforcement Planning in Policy-Search
We introduce a learning method called “gradient-based reinforcement planning” (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with n...
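The notion of an exact policy gradient computed by planning, before any interaction with the environment, can be sketched as follows. This is our own tabular illustration (not GREP itself): for a hypothetical 2-state, 2-action MDP with a softmax policy, the value function, action values, and discounted state-visitation frequencies are all obtained analytically from the model, giving the exact gradient of the expected discounted return.

```python
import numpy as np

gamma = 0.9
P = np.array([                      # P[a, s, s']: toy transition model
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.9], [0.7, 0.3]],
])
r = np.array([[1.0, 0.0],           # r[s, a]: expected immediate reward
              [0.0, 2.0]])
mu0 = np.array([0.5, 0.5])          # initial state distribution

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def exact_gradient(theta):
    pi = softmax(theta)                              # pi[s, a]
    P_pi = np.einsum('sa,ast->st', pi, P)            # state transitions under pi
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum('ast,t->sa', P, V)
    d = np.linalg.solve(np.eye(2) - gamma * P_pi.T, mu0)  # discounted visits
    # Softmax policy gradient: dJ/dtheta[s,a] = d(s) pi(a|s) (Q(s,a) - V(s)).
    grad = d[:, None] * pi * (Q - V[:, None])
    return grad, mu0 @ V                             # gradient and J(theta)

theta = np.zeros((2, 2))
for _ in range(200):                # improve the policy purely by planning
    grad, J = exact_gradient(theta)
    theta += 0.5 * grad

print(round(J, 3))
```

Because every quantity comes from the model, the gradient here is exact rather than a sampled estimate; it can be checked against finite differences of J, and ascent on it improves the policy before a single action is taken in the environment.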
متن کاملar X iv : c s . A I / 01 11 06 0 v 1 28 N ov 2 00 1 Gradient - based Reinforcement Planning in Policy
We introduce a learning method called “gradient-based reinforcement planning” (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with n...
Full text